1 Summary and Conclusions

1. Responses of single units and unit clusters were recorded in the ferret primary auditory
cortex (AI) using broadband complex dynamic spectra. Previous work (Kowalski et al.
1995) demonstrated that simpler spectra consisting of single moving ripples (i.e., sinusoidally
modulated spectral profiles that travel at a constant velocity along the logarithmic
frequency axis) could be used effectively to characterize the response fields and transfer
functions of AI cells.

2. An arbitrary complex dynamic spectral profile can be thought of as a weighted sum of
moving ripple spectra. Such a decomposition can be computed from a two-dimensional
spectro-temporal Fourier transform of the dynamic spectral profile, with moving ripples
as the basis functions.

3. Therefore, if AI units were essentially linear, i.e., if they satisfied the superposition
principle, then their responses to arbitrary dynamic spectra could be predicted from the
responses to single moving ripples, i.e., from the units' response fields and transfer functions.

4.  This  conjecture  was  tested  and  confirmed  with  data  from  293  combinations  of  moving 
ripples,  involving  complex  spectra  composed  of  up  to  15  moving  ripples  of  different  ripple 
frequencies  and  velocities.  For  each  case,  response  predictions  based  on  the  unit  transfer 
functions  were  compared  to  measured  responses.  The  correlation  between  predicted  and 
measured responses was found to be consistently high (84% with $\rho > 0.6$).

5. The distribution of response parameters suggests that AI cells may encode the profile of
a  dynamic  spectrum  by  performing  a  multiscale  spectro-temporal  decomposition  of  the 
dynamic  spectral  profile  in  a  largely  linear  manner. 

2  Introduction 

Acoustic stimuli with broadband dynamic spectra evoke strong and relatively sustained responses
in neurons of the primary auditory cortex (AI) (de Ribaupierre et al. 1972, Eggermont
1994,  Kowalski  et  al.  1995).  The  response  patterns  reflect  details  of  both  the  spectral  shape 
and  its  changes  in  time.  To  characterize  these  neurons  or  units,  elementary  broad-band  spectra 
with  envelopes  sinusoidally  modulated  on  a  logarithmic  axis  (ripples)  were  presented  over  a 
wide  range  of  parameters  such  as  ripple  frequencies,  phases,  and  velocities  (Kowalski  et  al. 
1995). A typical and most prominent feature of the responses is the synchronized component
which tracks the periodicity of the stimulus envelope. The amplitude and phase of this component
could be measured from period histograms of the neural responses and plotted against
different stimulus parameters, thus obtaining a variety of transfer functions. For example, the
response component as a function of ripple frequency is the ripple transfer function, whose
inverse Fourier transform is the response field of the unit (RF). Similarly, the response as a
function of ripple velocity gives the temporal transfer function and its inverse transform, the
temporal impulse response (IR).

Implicit in the use of the RF and IR to describe AI responses is an assumption of linearity.
That is, these “linear systems response measures” meaningfully characterize the response of
a unit to the spectral shape on the one hand and its dynamics on the other. This is in fact
the case for the RF, in that AI responses to input spectra such as stationary ripples obey the
superposition principle, and hence can be linearly weighted and summed to predict the responses
to arbitrary combinations of stationary ripples (Shamma and Versnel 1995). This result also
applies to moving ripples since the RF shape (to within a scaling factor) is independent of the
ripple velocities (Kowalski et al. 1995).

However, the relevance of the IR in characterizing AI unit responses is uncertain since the
linearity of the temporal responses to dynamic spectra has not been directly demonstrated. The
purpose of the experiments described here is to test directly whether the superposition principle
holds for the responses elicited by moving ripples with different velocities. Specifically,
our goal is first to record the responses of a unit to single ripples moving at a range of velocities.
Then, we form the superposition of a specific combination of these responses (e.g., to a pair
or triplet of moving ripples), and compare it to the actual responses obtained from a stimulus
spectrum composed of the same ripple combination. If AI responses are linear, the two response
patterns must be similar, and the RF and IR can be used to predict the responses of the unit
to any arbitrary broadband dynamic spectrum.


3  METHODS 

3.1  Surgery  and  animal  preparation 

Ferrets (Mustela putorius) were anesthetized with sodium pentobarbital (40 mg/kg) and maintained
in an areflexic state throughout the experiment by a continuous intravenous infusion of
pentobarbital and lactated Ringer solution, mixed with dextrose to compensate for the animals' high
metabolic  rate.  The  ectosylvian  gyrus,  which  includes  the  primary  auditory  cortex,  was  exposed 
by  craniotomy  and  the  dura  was  reflected.  The  contralateral  ear  canal  was  exposed,  cleaned 
and  partly  resected,  and  subsequently  a  cone-shaped  speculum  containing  a  Sony  MDR-E464 
miniature  speaker  was  sutured  to  the  meatal  stump.  For  details  on  the  surgery  see  Shamma 
et  al.  (1993).  The  protocols  for  anesthesia  and  animal  care  were  approved  by  the  University’s 
institutional  animal  care  and  use  committee. 

3.2  Acoustic  stimuli 

Various auditory stimuli were used: pure tones (single tone bursts, 200 ms duration, 10 ms
triangular rise and fall times), broad-band complex sounds (single ripples), and linear combinations
of these complex sounds (multiple ripples). These are briefly reviewed below; a more
extensive description can be found in Kowalski et al. (1995). All complex stimuli were computer
synthesized, gated, and then fed through an equalizer into the earphone. Calibration of
the sound delivery system (up to 20 kHz) was performed in situ using a 1/8-in. Brüel & Kjaer
probe microphone (type 4170). The microphone was inserted into the meatus through the wall
of the speculum to within 5 mm of the tympanic membrane. The speculum and microphone
setup closely resembles that suggested by Evans (1979).

Ripples  were  made  up  of  101  tones  equally  spaced  along  a  logarithmic  frequency  axis  and 
spanning  4.32  or  5  octaves  (e.g.,  1-20  kHz,  0.5-16  kHz  or  0.25-8  kHz),  such  that  the  response 
area of the cell being tested lay within the stimulus spectrum. The amplitude of each of the
tones was chosen so that the spectral envelope of the resulting broad-band stimulus forms a
sinusoid (a ripple) on a linear amplitude scale, with the amplitude usually set at 90-100%
modulation. Schematically, then, the envelope profile is given by:

$$S(x) = 1 + \Delta A \cdot \sin(2\pi \cdot \Omega \cdot x + \Phi), \tag{1}$$

where $\Delta A$ is 0.9 or 1, and $x$ is the logarithmic frequency axis (in octaves) defined as $x = \log_2(F/F_0)$,
where $F_0$ is the lower edge of the spectrum, i.e., 1 kHz, 0.5 kHz or 0.25 kHz, and $F$ is frequency.
$\Phi$ is an arbitrarily chosen phase factor. Note that when $\Delta A$ is zero, the resulting stimulus is a
flat spectrum.
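
As an illustration of Eq. 1, the following minimal sketch computes the linear amplitudes of the 101 tones of a stationary ripple. The function name `ripple_profile` and its default arguments (a 5-octave span starting at 0.5 kHz, 90% modulation) are assumptions chosen for the example, not the synthesis code used in the experiments.

```python
import math

def ripple_profile(ripple_freq, phase=0.0, delta_a=0.9,
                   n_tones=101, octaves=5.0, f0=500.0):
    """Return (tone frequencies in Hz, linear amplitudes) for a stationary ripple.

    ripple_freq is in cycles/octave, phase in radians; f0 is the lower
    edge of the spectrum (an illustrative default).
    """
    freqs, amps = [], []
    for i in range(n_tones):
        x = octaves * i / (n_tones - 1)      # position in octaves above f0
        freqs.append(f0 * 2.0 ** x)          # tones equally spaced on a log axis
        amps.append(1.0 + delta_a * math.sin(2.0 * math.pi * ripple_freq * x + phase))
    return freqs, amps

# Example: a 0.8 cycles/octave ripple spanning 0.5-16 kHz.
freqs, amps = ripple_profile(ripple_freq=0.8)
```

Setting `delta_a=0.0` returns an amplitude of 1 for every tone, i.e., the flat spectrum mentioned in the text.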

A moving ripple spectrum can be similarly characterized by its ripple frequency $\Omega$ (in
cycles/octave), ripple phase $\Phi$ (in radians), and ripple velocity $\omega$ (in Hertz):

$$S(x, t) = 1 + \Delta A \cdot \sin(2\pi(\omega t + \Omega x) + \Phi). \tag{2}$$

In these conventions, a positive value for $\omega$ corresponds to a ripple whose envelope travels
towards low frequencies. Moving ripple stimuli lasted up to 1.7 seconds with similar rise/fall
times. At the onset of the presentation, the ripple spectrum was initiated in sine phase
(defined as $\Phi = 0^\circ$) relative to the low frequency edge of the spectrum. The ripple began
moving to the left immediately at the specified constant velocity, although the stimulus was
only acoustically turned on 50 ms after the onset of motion. The overall level of a single-ripple
stimulus was calculated from the level of a single frequency component at $L_1$ dB SPL. Thus, an
$L_1$ level flat ripple is composed of 101 components, each at $L_1 - 10\log(101) \approx L_1 - 20$ dB. The
overall stimulus level was chosen on the basis of the threshold at BF; typically $L_1$ was set about
10 dB above threshold. High levels ($L_1 > 65$ dB SPL) were avoided to ensure the linearity
of our acoustic delivery system. For multiple-ripple stimuli, 100% modulation was defined by
rescaling the complex profile so that its most negative peak just touches zero, i.e., analogous
to setting $\Delta A = 1$ in the above equation.
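
The moving-ripple envelope of Eq. 2 and the per-component level can be sketched as follows. The function names are hypothetical, and the base-10 logarithm in `component_level` is an assumption consistent with the quoted value of about 20 dB for 101 components.

```python
import math

def moving_ripple(x, t, ripple_freq, velocity, phase=0.0, delta_a=1.0):
    """Envelope S(x, t) of Eq. 2; x in octaves, t in seconds,
    ripple_freq in cycles/octave, velocity in Hz, phase in radians."""
    return 1.0 + delta_a * math.sin(
        2.0 * math.pi * (velocity * t + ripple_freq * x) + phase)

def component_level(overall_db, n_tones=101):
    """Level (dB) of each tone in a flat spectrum of n_tones equal components."""
    return overall_db - 10.0 * math.log10(n_tones)

# With a positive velocity the envelope at time t + dt equals the earlier
# envelope shifted towards lower x by velocity * dt / ripple_freq octaves.
earlier = moving_ripple(1.0, 0.0, ripple_freq=0.8, velocity=4.0)
later = moving_ripple(1.0 - 4.0 * 0.01 / 0.8, 0.01, ripple_freq=0.8, velocity=4.0)
```

The last two lines illustrate the sign convention stated in the text: for a positive velocity the pattern moves towards the low-frequency edge.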

This  study  concentrated  on  the  responses  to  combinations  of  moving  ripples.  These  were 
generated  by  first  specifying  the  ripple  frequency,  initial  phase,  and  velocity  of  each  moving 
ripple  in  the  combination.  Then,  the  resulting  compound  waveform  due  to  the  superposition 
of  these  moving  ripples  was  computed  and  used  to  shape  the  time-varying  envelope  of  the 
complex stimulus. For example, Figure 1 illustrates the spectral envelopes that result from
adding different ripple combinations. The spectrogram to the left in Fig. 1A illustrates the
envelope of the spectrum that results from adding a ripple with $\Omega = 0.8$ cycles/octave moving
at 4 Hz, and starting at phase |, to another with $\Omega = 0.8$ cycles/octave, velocity of 8 Hz,
and starting at $\pi$. At any instant, the spectral profile appears sinusoidal as a function of the
logarithmic frequency axis with $\Omega = 0.8$ cycles/octave as demonstrated by the cross section
plotted above the spectrogram. The amplitude in time of any component in the spectrum is
shown by the cross section to the right, which is given by the sum of 4 and 8 Hz sinusoids.
Fig. 1B displays a complex spectrogram resulting from the addition of 5 ripples at the velocities
and random initial phases indicated in the figure legend.
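
A minimal sketch of how a ripple combination might be assembled and rescaled to 100% modulation is given below. The function `combined_envelope` is hypothetical; the pair of ripples uses the ripple frequency and velocities quoted for Fig. 1A, but the initial phases (0 and π) are illustrative values only.

```python
import math

def combined_envelope(ripples, xs, ts):
    """Sum moving ripples and rescale so the deepest trough just touches zero.

    ripples: list of (ripple_freq, velocity, phase) tuples.
    xs: positions in octaves; ts: times in seconds.
    Returns envelope[t][x] on a linear amplitude scale.
    """
    modulation = [[sum(math.sin(2.0 * math.pi * (vel * t + rf * x) + ph)
                       for rf, vel, ph in ripples)
                   for x in xs]
                  for t in ts]
    trough = min(min(row) for row in modulation)
    if trough >= 0.0:
        raise ValueError("modulation has no negative excursion to rescale")
    scale = -1.0 / trough                    # most negative peak maps to -1
    return [[1.0 + scale * v for v in row] for row in modulation]

xs = [5.0 * i / 100 for i in range(101)]     # 5 octaves, 101 points
ts = [0.25 * k / 50 for k in range(50)]      # one 250 ms period of a 4 + 8 Hz pair
pair = [(0.8, 4.0, 0.0), (0.8, 8.0, math.pi)]
envelope = combined_envelope(pair, xs, ts)
```

Because the velocities are 4 and 8 Hz, the compound envelope repeats every 250 ms, which is why a single period of `ts` is sufficient here.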

3.3 Recordings

Action  potentials  from  single  units  were  recorded  using  glass-insulated  tungsten  microelectrodes 
with 5-6 MΩ tip impedances. Neural signals were led through a window discriminator and the
time  of  spike  occurrence  relative  to  stimulus  delivery  was  stored  on  a  computer.  The  computer 
also  controlled  stimulus  delivery,  and  created  various  raster  displays  of  the  responses.  Each 
single-ripple  stimulus  combination  was  presented  40  times  in  a  test,  and  multiple-ripple  stimuli 
were  usually  presented  150  times. 

A single unit was visually identified and isolated using a window discriminator. A cluster
is defined as a group of 2-5 single units distinguished by spike amplitude. In each animal,
electrode penetrations were made perpendicular to the cortical surface. Within a penetration,
cells were typically isolated at depths of 350-600 µm, corresponding to cortical layers III and
IV (Shamma et al. 1993).

3.4  Data  analysis  for  tonal  stimuli 

For  each  cell,  a  frequency  response  curve  was  measured  with  up  to  1/8  octave  resolution  at  low 
intensity.  The  best  frequency  (BF)  was  determined  from  this  response  curve  as  the  frequency 
which  evoked  the  best  response  as  measured  by  counting  the  spikes  evoked  by  the  tone.  Thus, 
BF  is  the  frequency  that  elicits  a  response  at  the  lowest  possible  threshold.  The  rate-level 
function  at  BF  was  measured  at  a  range  from  35  to  85  dB  SPL  in  order  to  determine  the  cell’s 
response  threshold  and  the  non-monotonicity.  The  criteria  were  10%  of  maximum  response  and 
a  decrease  of  25%  with  increase  of  intensity,  respectively. 

3.5  Data  analysis  for  ripple  stimuli 

3.5.1  Single  stationary  ripples 

Each unit was initially tested with stationary single ripple stimuli over the range 0-2 cycles/octave
in steps of 0.4 cycles/octave. At each ripple frequency, the amplitude and phase
of the primary response component synchronized to the ripple frequency were determined (as
described in detail in Shamma et al. 1995). The transfer function was then inverse Fourier
transformed to compute the unit's RF. The transfer function usually peaks around a characteristic
ripple frequency, which will be referred to as $\Omega_0$.

3.5.2  Single  moving  ripples 

Single moving ripples were presented in one of two ways: (1) over a range of velocities at a
specified ripple frequency (usually at $\Omega_0$) to measure the “temporal” transfer function, or (2)
over a range of ripple frequencies at a specific velocity (usually at $\omega_m$, where the “temporal”
transfer function is maximum) to measure the “ripple” transfer function. For either test, the
strength of the phase-locked responses was assessed from period histograms with a time-base
of 16 or 32 bins constructed at each $\omega$ or $\Omega$, as described in detail in Kowalski et al. (1995).
The amplitude and phase of the response component synchronized to each $\omega$ or $\Omega$ were then
derived from the first coefficient of a 16 or 32 point FFT of the histogram ($AC_1(\omega)$ or $AC_1(\Omega)$).
The amplitude of this component was then weighted by the total rms value of the response and
used to construct the temporal or ripple transfer function of the unit ($T_{\Omega_0}(\omega)$ and $T_{\omega_m}(\Omega)$).
The temporal transfer function is therefore given by:


$$T_{\Omega_0}(\omega) = AC_1(\omega) \cdot \frac{|AC_1(\omega)|}{\sqrt{\sum_i |AC_i(\omega)|^2}}, \tag{3}$$


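where $|AC_i(\omega)|$ is the magnitude of the $i$th Fourier component of the period histogram response.
In general, $T_{\Omega_0}(\omega)$ can also be written as:

$$T_{\Omega_0}(\omega) = |T_{\Omega_0}(\omega)| \, e^{j\Phi_{\Omega_0}(\omega)}, \quad \text{where } j = \sqrt{-1}. \tag{4}$$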
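
The measurement of one transfer-function sample can be sketched as follows: spike times are folded onto one stimulus period, binned, and the first Fourier coefficient is extracted. All function names are hypothetical. The 120 ms analysis start is the value quoted in the text; the weighting in `transfer_sample` (the magnitude of the first coefficient divided by the root-sum-square of the harmonics) is an assumed reading of "weighted by the total rms value of the response", not a confirmed formula.

```python
import cmath
import math

def period_histogram(spike_times, velocity, n_bins=16, t_start=0.12):
    """Fold spike times (s) onto one stimulus period (1/velocity) and count per bin."""
    period = 1.0 / velocity
    counts = [0] * n_bins
    for t in spike_times:
        if t < t_start:
            continue                          # ignore spikes before the analysis window
        frac = ((t - t_start) % period) / period
        counts[min(int(frac * n_bins), n_bins - 1)] += 1
    return counts

def fourier_coefficient(counts, k):
    """k-th DFT coefficient of the histogram, normalized by the number of bins."""
    n = len(counts)
    return sum(c * cmath.exp(-2j * math.pi * k * i / n)
               for i, c in enumerate(counts)) / n

def transfer_sample(counts):
    """First coefficient weighted by |AC_1| / sqrt(sum_i |AC_i|^2) (assumed form)."""
    n = len(counts)
    harmonics = [fourier_coefficient(counts, k) for k in range(1, n // 2 + 1)]
    rms = math.sqrt(sum(abs(h) ** 2 for h in harmonics))
    ac1 = harmonics[0]
    return ac1 * abs(ac1) / rms if rms > 0 else 0j
```

Under this weighting, a response that is purely sinusoidal at the stimulus rate keeps its full amplitude, while a response whose energy is spread over higher harmonics is scaled down.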


Figure 2 illustrates the magnitude $|T_{\Omega_0}(\omega)|$ and the unwrapped phase $\Phi_{\Omega_0}(\omega)$ of the transfer
function $T_{\Omega_0}(\omega)$ (top panels). In almost all units recorded, the phase function could be fit well
by a straight line (Kowalski et al. 1995). The slope of the line reflects the absolute
time delay ($\tau_d$) between stimulus and responses. Note that this estimate is contaminated by
the additional delay due to the arbitrary choice of the starting time of the period histogram.
In all cases shown in this paper, the period histograms are constructed from responses starting
at t = 120 ms, and hence the absolute time delay can be computed from:

$$\tau_d = \text{slope (radian/Hz)} - 0.12 \text{ seconds}. \tag{5}$$

Another parameter of the linear fit of the phase function is its intercept along the ordinate,
$\Phi_{\Omega_0}(0)$, which is an additional constant phase shift in the period histogram relative to the
stimulus ripple phase. For more details of this analysis, see Kowalski et al. (1995).
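
The straight-line fit to the phase function can be sketched as a phase unwrapping step followed by ordinary least squares. Both helper names are hypothetical, and the unwrapping rule (successive samples are assumed to differ by less than π) is an assumption of this sketch rather than a documented part of the analysis.

```python
import math

def unwrap(phases):
    """Remove 2*pi jumps, assuming successive samples differ by less than pi."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= 2.0 * math.pi * round(d / (2.0 * math.pi))
        out.append(out[-1] + d)
    return out

def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx
```

Applied to the phases measured at each velocity, `fit_line` returns the slope, from which the delay estimate is derived, and the intercept, which is the constant phase shift discussed in the text.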

The ripple transfer function $T_{\omega_m}(\Omega)$ can be similarly written as:

$$T_{\omega_m}(\Omega) = |T_{\omega_m}(\Omega)| \, e^{j\Phi_{\omega_m}(\Omega)}, \quad \text{where } j = \sqrt{-1}. \tag{6}$$

Figure 3 illustrates the magnitude $|T_{\omega_m}(\Omega)|$ and the unwrapped phase $\Phi_{\omega_m}(\Omega)$ of the transfer
function, respectively (top panels). A straight line fit to the phase function $\Phi_{\omega_m}(\Omega)$
can be described by:

$$\Phi_{\omega_m}(\Omega) = x_m \cdot \Omega + \Phi_{\omega_m}(0), \tag{7}$$

where $x_m$ is the slope of the line, and $\Phi_{\omega_m}(0)$ is its intercept. The parameter $x_m$ reflects the
location (in octaves) of the response field relative to the left edge of the ripple. The distance
from the center of the RF envelope to the left edge of the spectrum is given by $k \cdot \frac{1}{\Delta} + x_m$,
where $\Delta$ is the step size of the ripple frequencies tested, and $k$ is an integer > 1 (Shamma et
al. 1995). The intercept $\Phi_{\omega_m}(0)$ is an additional constant phase shift in the period histogram
relative to the stimulus ripple phase.

In  most  units,  the  transfer  functions  were  measured  only  at  one  overall  stimulus  level  which 
elicited a relatively strong response. Previous studies have determined that responses to stationary
and traveling stimuli are not critically dependent on the base intensity (Shamma et al.
1995;  Kowalski  et  al.  1995). 

The ripple and temporal transfer functions were inverse Fourier transformed to obtain the
corresponding RF and IR functions, shown in the bottom panels of Figs. 3 and 2,
respectively. In either case, the phase function was modified to remove the inappropriate constant
phase shifts (Kowalski et al. 1995).
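
The inverse transform from a sampled transfer function to a response field can be sketched as a direct Fourier sum. The function `inverse_transform` is hypothetical; the sign convention of the exponent and the use of only the real part (i.e., assuming a real-valued RF) are assumptions, and the phase correction mentioned above is taken to have been applied to the coefficients beforehand.

```python
import cmath
import math

def inverse_transform(freqs, coeffs, points):
    """Real inverse Fourier sum: f(p) = Re sum_k T(freqs[k]) * exp(j*2*pi*freqs[k]*p).

    freqs: ripple frequencies (cycles/octave) or velocities (Hz).
    coeffs: complex transfer-function samples at those frequencies.
    points: positions (octaves) or times (s) at which to evaluate the result.
    """
    return [sum((c * cmath.exp(2j * math.pi * f * p)).real
                for f, c in zip(freqs, coeffs))
            for p in points]

# Transfer function sampled every 0.4 cycles/octave, with a linear phase
# that places the response field 2 octaves above the lower edge (made-up values).
freqs = [0.4 * k for k in range(6)]
mags = [0.2, 0.6, 1.0, 0.6, 0.2, 0.05]
coeffs = [m * cmath.exp(-2j * math.pi * f * 2.0) for m, f in zip(mags, freqs)]
xs = [1.0 + 0.05 * i for i in range(41)]
rf = inverse_transform(freqs, coeffs, xs)
```

Because the transfer function is sampled in steps of 0.4 cycles/octave, the reconstructed profile repeats every 1/0.4 = 2.5 octaves, so the position recovered from the phase slope is only defined up to a multiple of that period.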

3.5.3 Combinations of moving ripples

Responses  to  stimuli  composed  of  multiple  moving  ripples  were  recorded  and  compared  to 
predictions  made  from  the  temporal  and  ripple  transfer  functions  of  the  cell.  The  ripple- 
combination  stimuli  fell  into  one  of  the  following  categories:  (1)  2-4  moving  ripple  combinations, 
all with the same ripple frequency, but with different velocities; (2) Simulated FM-tone spectrum
composed of 5 moving ripples with ripple frequencies, initial ripple phases, and velocities
chosen  such  that  the  composite  spectrum  resembles  a  single  sharp  peak  sweeping  downwards 
at  a  constant  velocity;  (3)  Simulated  temporal  noise  composed  of  5-15  moving  ripples  with  the 
same  ripple  frequency,  but  different  velocities  and  random  initial  phases;  (4)  Simulated  ripple 
noise  composed  of  5  moving  ripples  with  the  same  velocity,  but  different  ripple  frequencies  and 
random  initial  phases. 

Measured  responses  from  100  repetitions  of  the  stimulus  were  analyzed  using  16-  or  32- 
bin  period  histograms  with  the  period  being  the  fundamental  period  of  the  stimulus.  If  the 
maximum spike count did not exceed 15 spikes in any of the bins, the response was considered
weak and excluded from further analysis.

Predicted responses of a unit were computed from its responses to single moving ripples,
i.e., its temporal and ripple transfer functions, and their inverse transforms, the IR and RF. For those
stimuli in which only a single ripple frequency $\Omega$ was used (categories 1 and 3 above), only the
temporal transfer function at that $\Omega$ was needed, since the predicted curve could be directly
extracted by superposition of the normalized period histograms measured at the appropriate
velocities.

In general, however, predictions for all stimuli could be derived from the RF and IR of
a unit as illustrated in Figure 4. In Fig. 4A, the envelope of a 3 ripple-combination stimulus
is depicted in the form of a spectrogram. The stimulus (and hence all responses) is periodic
with a fundamental period of 250 ms. The RF of the cell is computed from the inverse
Fourier transform of the phase-corrected ripple transfer function as discussed earlier, and is
shown in Fig. 4B oriented (vertically) along the frequency (tonotopic) axis, with BF ≈ 3 kHz.
Fig. 4C illustrates the product of the RF with the stimulus profile as a function of time, which
represents the response of the unit due to the RF alone. This (periodic) function is then
modified by the dynamic response properties of the cell through a convolution with the IR
shown in Fig. 4D. One fundamental period of the final predicted response of the unit is illustrated
in Fig. 4E (solid line), superimposed (with an arbitrary scale) against the measured response of
the  cell  to  the  stimulus  (dashed  line).  Note  that,  while  the  predicted  curve  fluctuates  equally 
around  zero  (little  or  no  sustained  responses  are  usually  observed),  the  measured  response  curve 
is  always  half-wave  rectified  (and  sometimes  also  saturated). 
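
The two-stage prediction described above can be sketched as a spectral weighting followed by a temporal convolution. The function names are hypothetical. Two details are assumptions of this sketch: the product of the RF with the stimulus profile is summed across the frequency axis to give one value per time step, and the convolution is circular because the stimulus is periodic.

```python
def rf_output(rf, envelope):
    """Weight each instantaneous spectral profile by the RF and sum over frequency.

    rf: response field sampled along the frequency axis.
    envelope: envelope[t][x], one fundamental period of the stimulus.
    """
    return [sum(r * s for r, s in zip(rf, row)) for row in envelope]

def circular_convolve(signal, kernel):
    """Convolve one period of a periodic signal with an impulse response."""
    n = len(signal)
    return [sum(kernel[k] * signal[(i - k) % n] for k in range(len(kernel)))
            for i in range(n)]

def predict_response(rf, ir, envelope):
    """Linear prediction: RF stage followed by convolution with the IR."""
    return circular_convolve(rf_output(rf, envelope), ir)
```

The output of `predict_response` is a zero-threshold linear prediction; it can take negative values, unlike the measured spike counts.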

To assess objectively the similarity between the two functions, a correlation coefficient is
defined as:

$$\rho = \frac{\sum_t r_{meas}(t) \cdot r_{pred}(t)}{\sqrt{\sum_t r_{meas}^2(t) \cdot \sum_t r_{pred}^2(t)}}, \tag{8}$$

where $r_{meas}(t)$ is the measured spike count curve and $r_{pred}(t)$ is the predicted response curve.
Since $r_{meas}(t)$ is half-wave rectified, the comparison would be more accurate if the correlation
coefficient were computed with a half-wave rectified $r_{pred}(t)$. In this case, uncorrelated responses
have coefficients of about 0.35 (rather than zero for non-rectified functions). A histogram of
the correlation coefficients from all units/clusters tested is compiled based on the different
categories of tests. Sometimes, several stimuli of the same type were presented to a cell, e.g.,
various combinations of moving ripple pairs. In these cases, the correlation coefficient indicated
is the average obtained from all such tests.
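
The comparison between measured and predicted curves can be sketched as a normalized correlation with optional half-wave rectification of the prediction. The function names are hypothetical, and the normalization by the energies of the two curves (without mean subtraction) is an assumption, chosen because it yields a nonzero baseline for uncorrelated rectified waveforms, as the text describes.

```python
import math

def half_wave(values):
    """Half-wave rectify: negative samples are set to zero."""
    return [max(v, 0.0) for v in values]

def correlation(meas, pred, rectify=True):
    """Normalized correlation between a measured and a predicted response.

    The prediction is half-wave rectified first by default, to match the
    rectified character of the measured spike counts.
    """
    p = half_wave(pred) if rectify else list(pred)
    num = sum(m * q for m, q in zip(meas, p))
    den = math.sqrt(sum(m * m for m in meas) * sum(q * q for q in p))
    return num / den if den > 0 else 0.0
```

With this definition the coefficient is insensitive to the arbitrary scale of the prediction. As a point of reference, two half-wave rectified sinusoids in quadrature give a value of 1/π, about 0.32, which is of the same order as the baseline of about 0.35 quoted above.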

4  RESULTS 

293  combination  stimuli  were  presented  to  51  single  units  and  clusters  in  5  animals  (35  single 
units,  16  clusters).  All  examples  shown  in  the  figures  are  responses  of  single  units,  although 
those  obtained  from  clusters  were  very  similar  in  character.  Correlation  coefficients  from  the 
two  populations  are  displayed  separately  in  the  histograms. 

4.1 Responses to simple combinations of ripples: pairs, triplets and quadruplets

The  response  patterns  and  their  predictions  are  illustrated  in  Figure  5  for  three  different  units 
stimulated  by  different  two-ripple  combinations.  Spectrograms  of  the  stimulus  envelopes  are 
shown in column A. The RF and IR of each unit are shown in columns B and D, respectively.
Column C illustrates the predicted response of each unit due to the RF alone, while the final
predicted and measured responses are shown in column E.

The ability of the IR to capture and predict the dynamics of the responses is best seen by
comparing panels C and E for each unit. For example, in the middle unit, the responses due to
the RF alone (panel C) become almost completely inverted in panel E (dashed curve). This
inversion is due to the unusual “inverted” form of the IR, which is described in detail in the
companion paper (Kowalski et al. 1995). More typical IRs, as in the top and bottom units,
produce similar, though not as dramatic, transformations of the waveforms between panels
C and E, such as additional absolute delays (reflecting the effects of $\tau_d$), and changes in the
relative  heights  of  the  response  peaks,  most  notably  in  the  bottom  unit  where  in  the  final 
response  (panel  E)  the  two  peaks  become  comparable  in  size. 

Figure 6 shows responses from the same unit as in Fig. 5 (top) to three stimuli with increasing
numbers of moving ripples. The responses generally exhibit the same features as before,
especially the waveform transformations due to the IR (panels C vs E). Furthermore, the predicted
responses (solid lines in column E) match reasonably well the outlines and some of the
details of the measured responses ($\rho > 0.78$). For instance, the time of occurrence of the largest
peak in the response varies depending on the stimulus in a similar manner for both predicted
and measured responses. Another example of such a matching covariation of the responses for
different stimuli is shown in Figure 7A. In addition, the IR of this unit (column D) is such
that it induces a significant transformation of the responses similar to that seen earlier for the
middle unit in Fig. 5.

Note, however, that in the examples of both Figs. 5 and 6, predicted responses may be delayed
or advanced relative to the measured curve. Such shifts are most likely due to errors in the
measurements of the slope of the temporal phase function $\Phi_{\Omega_0}(\omega)$, which affect $\tau_d$, the
absolute time delay of the IR.

In  general,  the  most  prominent  disparity  between  predicted  and  measured  response  curves 
in  all  cases  is  due  to  the  half-wave  rectification  of  the  spike  rates  (Kowalski  et  al.  1995).  The 
effects of this instantaneous nonlinearity can be readily understood by comparing the responses
to the stimulus and its inverted envelope shown in Fig. 7B. In the top row, the final response has
a single large peak that occurs at 170 ms. In the bottom row, the stimulus envelope has been
inverted. The response also becomes inverted, thus revealing the previously half-wave rectified
fluctuations. Given the two responses, one can construct a “linearized”, or non-rectified, version
(dotted curve in the bottom panel E) that better matches the linear predictions (solid lines).
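
One way to form such a linearized response is to subtract the response to the inverted stimulus from the response to the original stimulus, bin by bin. This construction and the function name `linearize` are assumptions of the sketch; the text does not specify how the dotted curve was computed.

```python
def linearize(response, response_inverted):
    """Combine responses to a stimulus and its inverted envelope.

    If each response is a half-wave rectified copy of an underlying linear
    waveform y(t) and of -y(t) respectively, their difference restores y(t).
    """
    return [a - b for a, b in zip(response, response_inverted)]
```

The sketch ignores any spontaneous rate or saturation, which would have to be handled separately before the subtraction.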

A histogram of the correlation coefficients between the responses to stimuli with two,
three or four ripples and their predictions is displayed in Figure 8. Most responses to these
ripple stimuli were reasonably predictable (84% with $\rho > 0.6$).

4.2  Responses  to  complex  combinations  of  ripples 

The results shown in Figures 9 and 10 demonstrate further the predictive power of the RF
and IR, and hence the extent of the linearity of the responses. Figure 9 shows examples of
responses and predictions to complex combinations of moving ripples for one single unit. As
many as 15 ripples were added together to create these stimuli, resulting in complex envelope
profiles as seen in the spectrograms (column A). The stimulus at the top was an FM-like stimulus,
mimicking the spectrum of a single traveling tone. The middle stimulus is temporally noise-like,
but spectrally simple, whereas the last stimulus has the opposite character. Predictions in all
cases give significant correlation coefficients (0.88, 0.83, 0.89).

Figure 10 illustrates similar findings from additional complex stimuli or different units. Note
that for all three examples, the impulse response (column D) plays a key role in that the final
responses cannot be readily predicted from the RF alone. Instead, the waveforms in column
C are almost inverted by the IR in the top two examples; in the bottom example, the fast
temporal modulations in the input and in panel C are heavily filtered out by the IR to produce
a smoother final response (panel E).

The histogram in Figure 11 sums up the correlation coefficients found between predicted
and measured responses for all units tested with complex ripple combinations. As with
simple ripple combinations, the majority of correlation coefficients are large (89%
with $\rho > 0.6$).

5  DISCUSSION 

5.1  Summary  of  responses  to  moving  ripple  combinations 

The results presented in this report support the hypothesis that AI units' responses to arbitrary
dynamic spectra are reasonably predictable once the response to single moving ripples is
known, or specifically, once the RF and IR are known. This was demonstrated by validating the
superposition principle, that is, the responses to a combination of moving ripples compare well
with those predicted from a linear sum of the responses to the individual constituent ripples.
This was found to hold for spectra composed of up to 15 ripples, with various ripple frequencies,
velocities, and initial phases, and despite a wide range of sources of error. For instance, some
error is introduced by the assumption of separability of the spectral and temporal dimensions
of the transfer function as discussed in Kowalski et al. (1995). Errors are also inevitable in
computing the RF and IR, where the phase functions have to be fitted and adjusted to subtract
the inappropriate terms (as discussed in Methods). Finally, measurements of the transfer
functions and of the multiple ripple stimuli are done sequentially over a relatively long period of
time (from 30 to 60 minutes) during which the state of the animal is likely to change somewhat.

5.2  The  effects  of  response  nonlinearities 

AI  units  exhibit  response  nonlinearities  due  to  threshold  and  saturation  that  cause  their  firing 
rates  to  be  half-wave  rectified,  clipped,  or  exponentially  distorted.  The  first  two  are  well 
recognized  properties  of  nerve  cell  firings  and  their  effects  are  immediately  visible  by  comparing 
the  (linearly)  predicted  response  waveforms  (which  can  go  negative)  to  the  actual  responses 
shown  in  most  examples  in  this  paper.  The  third  effect  is  more  subtle;  it  is  well  illustrated  by 
the responses in Fig. 6 where the response peaks appear sharper (or have steeper skirts) as if
the predicted waveform has been exponentially enhanced. This distortion is common in many
responses shown here and in Kowalski et al. (1995).

However,  these  largely  instantaneous  nonlinearities  appear  to  act  upon  the  already  generated 
linear  response  pattern.  Therefore,  their  effects  are  relatively  transparent  and  the  underlying 
linearly  predicted  waveform  is  immediately  accessible  as  demonstrated  in  all  examples  in  the 
paper.  Furthermore,  the  information  content  in  the  distorted  response  waveform  remains  intact 
and hence a “linear” version of the response can be recovered easily in a manner similar to that
discussed  in  Shamma  et  al.  (1995)  for  responses  to  stationary  spectra. 
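
One way to see why the information survives rectification is the inverted-stimulus pairing used in Fig. 7(B). The Python sketch below assumes an idealized unit whose response is exactly the half-wave rectified linear prediction, and assumes that inverting the stimulus inverts that prediction. Under those assumptions the difference of the two measured responses returns the non-rectified waveform. The waveform itself is hypothetical, and this is not a reproduction of the procedure in Shamma et al. (1995).

```python
import math

def idealized_response(linear_values):
    """Assumed unit: output is the half-wave rectified linear prediction."""
    return [max(v, 0.0) for v in linear_values]

# Hypothetical linear prediction over one 0.25 s period, sampled every 1 ms.
times = [n / 1000.0 for n in range(250)]
linear = [math.sin(2 * math.pi * 4 * t) + 0.5 * math.sin(2 * math.pi * 8 * t)
          for t in times]

# Responses to the stimulus and to its inverted version.
response = idealized_response(linear)
response_inverted = idealized_response([-v for v in linear])

# The inverted-stimulus response supplies the half that rectification removed.
recovered = [a - b for a, b in zip(response, response_inverted)]
```

With real spike counts the two responses are noisy and the nonlinearity is not a perfect rectifier, so the recovered waveform would only approximate the linear prediction.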

5.3  Functional  significance  of  AI  temporal  response  properties 

AI  units  respond  to  changes  in  the  spectral  envelope  in  a  substantially  linear  and  temporally 
selective manner. They are usually tuned around a specific rate between 2 and 16 Hz, with an
approximate  bandwidth  of  3  octaves.  These  response  parameters  are  summarized  for  all  units 
isolated  in  Fig. 6  in  the  companion  paper  (Kowalski  et  al.  1995).  The  functional  implications 
of  the  histogram  distribution,  however,  are  ambiguous.  On  the  one  hand,  AI  units  can  be 
said  to  be  all  tuned  around  an  average  rate  of  8  Hz;  the  scatter  around  this  value  in  the 
histogram  is  then  the  usual  noise  one  expects  in  a  physiological  system.  On  the  other  hand, 
one  may  interpret  the  histogram  as  a  broad  distribution  of  cells  with  transfer  functions  tuned 
to  different  temporal  rates,  all  with  approximately  similar  bandwidths. 

In  the  first  view,  the  8  Hz  tuning  may  be  seen  as  a  general  physiological  limitation  of  cortical 
cell dynamics, or as an epiphenomenon related to the rhythms induced by cortico-thalamic
loops (Eggermont, 1992). It may also be that temporal tuning per se is not important, but
rather that the impulse responses (IR) act functionally as a temporal derivative that abolishes
the sustained responses to stationary spectra and preserves only responses to dynamic spectra.

The  second  interpretation  of  the  histogram  implies  that  AI  units  have  impulse  response 
functions  with  a  range  of  dilations  analogous  to  the  range  of  different  bandwidths  exhibited  by 
the RF's. Specifically, it is assumed in this hypothesis that for any given RF, there are different
units with a range of IR's, each encoding the local dynamics of the spectrum at a different
time-scale,  i.e.,  there  are  units  exclusively  sensitive  to  slow  modulations  in  the  spectrum,  and 
others  tuned  to  moderate  or  fast  changes.  This  temporal  decomposition  is  analogous  to  the 
multiscale  representation  of  the  shape  of  the  spectrum  produced  by  the  RF's.  Such  an  analysis 
may underlie many important perceptual invariances such as the ability to recognize speech
and  melodies  despite  large  changes  in  rate  of  delivery  (Julesz  and  Hirsh  1972),  or  to  perceive 
continuous  music  and  speech  through  gaps,  noise,  and  other  short  duration  interruptions  in 
the  sound  stream.  Furthermore,  the  segregation  into  different  time-scales  such  as  fast  and  slow 
corresponds  to  the  intuitive  classification  of  many  natural  sounds  and  music  into  transient  and 
sustained, or into stop consonants and continuants in speech.

Finally,  an  overall  view  of  the  AI  representation  of  a  dynamic  spectrum  can  be  summarized 
as follows. First, AI units with a wide range of RF's generate a multiscale representation
of the spectrum. Next, the dynamic responses of each unit are effectively “differentiated” in
time by the IR's, possibly with different degrees of temporal resolution. This combined
spectro-temporal decomposition is remarkably similar in spirit to the common practice in engineering
systems  (such  as  speech  recognition  systems)  (Rabiner  and  Schafer  1979)  in  which  speech 
spectra  are  represented  in  terms  of  cepstral  coefficients  and  their  temporal  derivatives  (the 
differential  cepstral  coefficients).  As  discussed  in  more  detail  in  (Wang  and  Shamma,  1995), 
the  major  conceptual  difference  between  the  two  schemes  is  the  multiscale  nature  of  the  AI 
representation. 
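
The two stages summarized above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the ripple stimulus form S(x,t) = 1 + 0.5 sin(2π(ωt + Ωx)), the Gabor-shaped RF, the first-difference IR standing in for a temporal derivative, and the reading of the spectral stage as a weighted sum along the frequency axis. Circular convolution is used because the stimuli are periodic.

```python
import math

DX = 0.1    # octaves per spectral sample (assumed)
DT = 0.001  # seconds per time frame (assumed)
N_X = 50    # 5 octaves along the logarithmic frequency axis
N_T = 250   # one 0.25 s period of a 4 Hz ripple

def moving_ripple(ripple_freq, velocity):
    """S[n][k] = 1 + 0.5*sin(2*pi*(velocity*t + ripple_freq*x)).
    A velocity of 0 gives a stationary spectrum."""
    return [[1.0 + 0.5 * math.sin(2 * math.pi * (velocity * n * DT
                                                 + ripple_freq * k * DX))
             for k in range(N_X)] for n in range(N_T)]

def spectral_stage(spectrogram, rf):
    """Weight each spectral slice by the RF and sum along the frequency axis."""
    return [sum(s * w for s, w in zip(frame, rf)) for frame in spectrogram]

def temporal_stage(signal, ir):
    """Circularly convolve the periodic time function with the IR."""
    n = len(signal)
    return [sum(ir[k] * signal[(i - k) % n] for k in range(len(ir)))
            for i in range(n)]

# Hypothetical RF: Gabor function centered at 2.5 octaves, tuned to 0.8 cyc/oct.
rf = [math.exp(-((k * DX - 2.5) ** 2) / (2 * 0.3 ** 2))
      * math.cos(2 * math.pi * 0.8 * (k * DX - 2.5)) for k in range(N_X)]

# Hypothetical IR: a first difference, i.e. a crude temporal derivative.
ir = [1.0, -1.0]

dynamic = temporal_stage(spectral_stage(moving_ripple(0.8, 4.0), rf), ir)
stationary = temporal_stage(spectral_stage(moving_ripple(0.8, 0.0), rf), ir)
```

In this toy model `stationary` is zero everywhere while `dynamic` is not, which is the sense in which a derivative-like IR abolishes sustained responses to stationary spectra. Repeating the temporal stage with IR's of different durations would give the range of time-scales discussed above.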
List of Figures

Figure 1: Spectrograms of stimuli consisting of multiple moving ripple spectra. Each moving
ripple is characterized by its ripple frequency ($\Omega$), initial phase ($\Phi$), and its velocity ($\omega$).

(A): Spectrogram of a stimulus consisting of two moving ripples: $\Omega = 0.8$ cycles/octave, $\omega = 4$
Hz, $\Phi = \pi/6$ added to another ripple with $\Omega = 0.8$ cycles/octave, $\omega = 8$ Hz, and $\Phi = \pi$.
The spectrogram of the resulting combination ripple is sinusoidal at every instant, with
$\Omega = 0.8$ cycles/octave as illustrated by the upper cross-section. The time-course of each
spectral component is demonstrated by the cross-section to the right, which is the sum
of two sinusoidal waveforms at 4 and 8 Hz.

(B): Spectrogram of the addition of five ripples with random phases. The constituent ripples
are: $\Omega = 0.4, 0.8, 1.2, 1.6, 2.0$ cyc/oct, $\omega = 4, 8, 12, 16, 20$ Hz, $\Phi = 0, 30, 41, -101, 25$
degrees.

Figure 2: Measuring the IR of a unit through its temporal transfer function ($T_{\Omega_o}(\omega)$).
Responses are measured over a range of temporal frequencies $\omega$ with a fixed ripple frequency
$\Omega_o$ (the characteristic ripple of the unit) to determine the temporal transfer function
$T_{\Omega_o}(\omega)$. The magnitude $|T_{\Omega_o}(\omega)|$ and unwrapped phase $\Phi_{\Omega_o}(\omega)$ of the transfer function
are shown in the top plots. A straight line fit to the phase function, and its intercept
($\Phi_{\Omega_o}(0)$), are also shown. The phase function is then adjusted (see text for details) and
$T_{\Omega_o}(\omega)$ is inverse Fourier transformed to determine the impulse response function IR
shown in the bottom plot.

Figure 3: Measuring the RF of a unit through its ripple transfer function ($T_{\omega_m}(\Omega)$).
Responses are measured over a range of ripple frequencies $\Omega$ with a fixed ripple velocity
$\omega_m$ (the velocity of maximum response) to determine the ripple transfer function $T_{\omega_m}(\Omega)$.
The magnitude $|T_{\omega_m}(\Omega)|$ and unwrapped phase $\Phi_{\omega_m}(\Omega)$ of the transfer function are shown
in the top plots. A straight line fit to the phase function, and its intercept ($\Phi_{\omega_m}(0)$), are also
shown. The phase function is then adjusted (see text for details) and $T_{\omega_m}(\Omega)$ is inverse
Fourier transformed to determine the response field RF shown in the bottom plot.

Figure  4:  Predicting  the  final  response  to  multiple  moving  ripple  stimuli. 

(A): Spectrogram of the stimulus ($S(x,t)$), along with its ripple content (three moving ripples
in this case, all at $\Phi = 0$). The gray-scale indicates relative amplitude of the spectrogram.

(B): The RF of the cell (measured as described in Fig. 3) with BF = 3 kHz. The function is plotted
sideways, i.e., aligned to the logarithmic frequency axis of the spectrogram (which also
represents the tonotopic axis).

(C): Product of the stimulus spectrogram and the RF generates a time function which is the
response of the unit due to the RF alone.

(D): The IR(t) of the cell (measured as described in Fig. 2). The IR is convolved with the
function in (C) to produce the final response shown in (E).

(E): The final predicted response of the cell (thick solid line) superimposed on the measured
spike count (thin solid line with error bars). The error bars (indicating mean +/- SD) for
the measured response curve and the correlation between measured and rectified predicted
responses are also shown. The dashed line is the zero spike count. All abscissas measure
time in seconds; y-axes labeled on the left are normalized spike counts. Y-axes on the
right side indicate actual spike count. Arrows indicate the location of t = 0 of the periodic
functions in (C-E), relative to the corresponding period of the stimulus.

Figure 5: Examples of responses to pairs of ripples from three cells. All details of the plots
and  computations  are  as  in  Fig. 4. 

Figure 6: Examples of responses of a unit to three combination ripple stimuli: (top row) two
ripples, (middle row) three ripples, and (bottom row) four ripples. All details of the plots
and  computations  are  as  in  Fig. 4. 

Figure  7: 

(A): Example of responses of a unit to two- and four-ripple stimuli. The unit has a strongly
inverting IR function.

(B): Uncovering the half-wave rectified portion of the responses. The two stimuli illustrated
are inverted relative to each other, and hence their responses are also inverted. The measured
and predicted responses are indicated as described in Fig. 4. The measured
responses are also used to construct the “non-rectified” version of the response, depicted
by the dashed curve in panel (E) of the bottom row. All other details of the plots and
computations are as in Fig. 4.

Figure  8:  Histogram  of  the  correlation  coefficients  between  measured  and  predicted  responses 
to  stimuli  with  two,  three  or  four  moving  ripples. 

Figure 9: Examples of responses to complex combinations of ripples. Three complex combinations
are presented to the same unit: (Top row) simulated FM stimulus composed of
five ripples to produce effectively a single peak with velocity 20 oct/sec. (Middle row)
temporal noise stimulus generated with fifteen random phase moving ripples all at $\Omega = 1.2$
cycles/octave. (Bottom row) ripple noise stimulus generated by adding five ripples with
different $\Omega$'s and random phases, all moving at the same velocity. All other details of the
plots and computations are as in Fig. 4.

Figure 10: Responses of three different cells to three complex combinations. In all cases, the
IR plays a key role in shaping the final responses of the cells. The stimuli used are: (Top
row) temporal noise composed of five ripples. (Middle row) a simulated FM stimulus
effectively traveling at 20 oct/sec. (Bottom row) a simulated FM stimulus effectively
traveling at 50 oct/sec. All other details of the plots and computations are as in Fig. 4.

Figure  11:  Histogram  of  the  correlation  coefficients  between  measured  and  predicted  responses 
to  complex  ripple  stimuli  with  five  or  more  moving  ripples. 